This exercise is based on one from Unwin (2015), and uses the bomregions data from the DAAG package. The data contains regional rainfall for the years 1900-2008. The regional rainfall numbers are area-weighted averages for the respective regions. Extract just the rainfall columns from the data, along with year.
The total rainfall is divided by geographic area to put rainfall on a scale that can be compared across different-sized regions.
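The extraction step can be sketched as follows. This assumes that the installed version of DAAG still provides the bomregions data frame and that the rainfall columns all end in "Rain"; the object name rain is illustrative.

```r
library(DAAG)
library(dplyr)

# Assumes the installed DAAG version provides bomregions
data(bomregions, package = "DAAG")

# Keep the year plus every column whose name ends in "Rain"
rain <- bomregions %>%
  select(Year, ends_with("Rain"))

glimpse(rain)
```

Selecting by name suffix avoids hard-coding the list of regions, so the same line still works if a region is added or removed.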
ggduo function in the GGally package.)
The rainfall patterns are fairly flat across regions. The temporal trend differs a little from one region to another.
The eastRain, seRain and mdbRain series are strongly correlated. The northRain series and the Australian totals are strongly correlated. The northRain and eastRain series are moderately correlated. There are a few outliers (high average) in several regions, particularly the north, which suggests that the north occasionally has very heavy rain years.
There don’t appear to be more negative differences in recent years, although there is possibly a hint of this in several regions: swRain, seRain and mdbRain. There were several years of heavier than average rain in most regions in the early 1970s. Generally the pattern is a few wet years followed by a few dry years.
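One way to look for an excess of dry years is to plot each year's departure from its region's long-run mean. The sketch below assumes that "differences" means rainfall minus the regional mean over the full record, and that the installed DAAG version provides bomregions. The names rain_diff, diff and wet are illustrative, not taken from the original solution.

```r
library(DAAG)
library(dplyr)
library(tidyr)
library(ggplot2)

# Assumes the installed DAAG version provides bomregions
data(bomregions, package = "DAAG")

rain_diff <- bomregions %>%
  select(Year, ends_with("Rain")) %>%
  # One row per year and region
  pivot_longer(-Year, names_to = "region", values_to = "rain") %>%
  group_by(region) %>%
  # Assumed definition: departure from the region's mean over all years
  mutate(diff = rain - mean(rain, na.rm = TRUE)) %>%
  ungroup() %>%
  mutate(wet = diff > 0)

ggplot(rain_diff, aes(x = Year, y = diff, fill = wet)) +
  geom_col() +
  facet_wrap(~region, ncol = 1, scales = "free_y") +
  labs(y = "Difference from regional mean", fill = "Wetter than average")
```

Runs of bars below zero towards the right-hand end of a panel would point to more dry years recently in that region.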
This exercise is based on an example in Oscar Perpinan Lamigueiro (2018) “Displaying Time Series, Spatial, and Space-Time Data with R”. Read the US employment data from the book web site. This contains monthly unemployment numbers from 2000 through 2012 in different sectors.
tsibble. Make a line plot coloured by sector. What do you learn about unemployment during this time frame from this chart?
2008 is when unemployment rose in many sectors. In some sectors there is a strong seasonal pattern.
Some sectors, e.g. LNU03028615, LNU03032231 and LNU03035181, have a strong seasonal pattern. LNU03035181 and LNU03032237 appear to be less affected by the economic downturn.
geom_line with geom_area, to stack the series, with a different fill colour for each sector. What do you learn about the magnitude of the 2008 economic crisis? Can you read much from this chart about the effect on different sectors?
The big increase in unemployed after 2008 is emphasised by this chart. It’s difficult to examine the individual sectors, though.
This is a similar type of plot called a “stream graph”. The streamgraph package generates this as an interactive plot, which is great for exploring multiple nested time series.
This is a classic data example: Annual numbers of lynx trappings for 1821–1934 in Canada, from Brockwell & Davis (1991). It is a classic because it looks periodic, but it really doesn’t have a period. Here we look at two ways to examine the cyclic nature to check for periodicity.
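library(dplyr)
library(tsibble)
# Convert the ts object to a tsibble and give the measurement a clearer name
lynx_tsb <- as_tsibble(lynx) %>%
  rename(count = value)
# Label each 10-year block and the position of each year within its block;
# floor() keeps the block boundaries consistent with index %% 10
lynx_tsb <- lynx_tsb %>%
  mutate(decade = floor(index / 10),
         yr_decade = index %% 10)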
The orange vertical lines mark each decade, and the first one matches the first peak. For the first few decades the line matches the peak, but as time progresses the peak arrives a little earlier than the decade.
Snipping the series into 10-year blocks does not line up the peaks. Although it looks like a 10-year cycle, the cycle appears to be a little shorter than that, and slightly irregular.
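A rough numerical check of the cycle length is to locate the peaks and look at the spacing between them. This is an additional sketch, not one of the two methods in the exercise. It assumes that a peak can be defined as the largest count within three years either side; the names half_window, peak_years and peak_gaps are illustrative.

```r
# lynx ships with base R (datasets package)
yrs <- as.numeric(time(lynx))
cnt <- as.numeric(lynx)
n <- length(cnt)
half_window <- 3  # assumed half-width of the search window, in years

# A year is a peak if its count is the largest within +/- half_window years.
# Years too close to either end of the series are skipped.
candidates <- (half_window + 1):(n - half_window)
is_peak <- vapply(candidates, function(i) {
  cnt[i] == max(cnt[(i - half_window):(i + half_window)])
}, logical(1))

peak_years <- yrs[candidates[is_peak]]
peak_gaps <- diff(peak_years)

peak_years
peak_gaps
mean(peak_gaps)
```

If the series had a strict 10-year period, every gap would equal 10. Gaps that vary from peak to peak are consistent with the drift seen against the decade markers.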
In Earo Wang’s blog post introducing tsibble she used NYC bikes data. This data is now made available in the tsibbledata package.
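library(dplyr)
library(lubridate)
library(tsibble)
library(tsibbledata)
# Select May
hourly_trips <- nyc_bikes %>%
  filter(month(start_time) == 5) %>%
  # Count the trips that start in each hour, across all bikes
  index_by(start_hour = floor_date(start_time, unit = "1 hour")) %>%
  summarise(ntrips = n()) %>%
  as_tsibble()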
Yes
## # A tibble: 1 x 1
## .gaps
## <lgl>
## 1 TRUE
fill_gaps to make implicit missings explicit. Re-make the line plot from question c again.
There are a lot of gaps. It’s hard to see where they are because there are missings everywhere!
count_gaps function. How extensive are the missing values?
The missing values are extensive.
There are more missings in peak traffic hours, and at lunch times. This data is tracking 10 bikes over this time period. It’s not clear what generates the missing values, but they may be times when none of the bikes are being used.
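If the gaps really are hours in which none of the tracked bikes started a trip, one option is to fill them with zero counts rather than NA. The following is a sketch under that assumption, which the data alone cannot confirm; hourly_full is an illustrative name.

```r
library(dplyr)
library(lubridate)
library(tsibble)
library(tsibbledata)

hourly_trips <- nyc_bikes %>%
  filter(month(start_time) == 5) %>%
  index_by(start_hour = floor_date(start_time, unit = "1 hour")) %>%
  summarise(ntrips = n())

# Assumption: an hour with no recorded trips means zero trips
hourly_full <- hourly_trips %>%
  fill_gaps(ntrips = 0L)

# No implicit gaps should remain
has_gaps(hourly_full)

# Share of the filled hourly series with no trips
mean(hourly_full$ntrips == 0)
```

Filling with zeros makes the line plot continuous, but it would misrepresent the data if the gaps were caused by recording failures instead of idle bikes.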
We saw in the lecture notes that imputing with a simple method such as the mean or a moving average doesn’t work well when a time series has multiple seasonality. Here we will use a linear model to capture the seasonality and produce better imputations for the pedestrian sensor data (from the tsibble package).
Working days and non-working days typically have different daily patterns.
has_gaps(pedestrian, .full = TRUE)
## # A tibble: 4 x 2
## Sensor .gaps
## <chr> <lgl>
## 1 Birrarung Marr TRUE
## 2 Bourke Street Mall (North) TRUE
## 3 QV Market-Elizabeth St (West) TRUE
## 4 Southern Cross Station TRUE
ped_gaps <- pedestrian %>%
  count_gaps(.full = TRUE)
ped_full <- pedestrian %>%
  fill_gaps(.full = TRUE)
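hol. Make hour a factor - this helps to make a simple model for a non-standard daily pattern.
hol <- holiday_aus(2015:2016, state = "VIC")
ped_qvm <- ped_full %>%
  filter(Sensor == "QV Market-Elizabeth St (West)") %>%
  # fill_gaps() leaves Date and Time missing on the filled rows, so recompute them first
  mutate(Date = as_date(Date_Time), Time = hour(Date_Time)) %>%
  # Non-working day: a weekend, or a date in the public holiday table.
  # .env$hol refers to the holiday table rather than the column being created.
  mutate(hol = is.weekend(Date) | Date %in% .env$hol$date) %>%
  mutate(Time = factor(Time))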
Time and hol interacted.
ped_qvm_lm <- lm(Count ~ Time * hol, data = ped_qvm)
ped_qvm$pCount <- predict(ped_qvm_lm, ped_qvm)
This makes a much better imputed value. There’s still room for improvement, but it’s better than a nearest neighbour, mean or moving average imputation.
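The whole imputation can be sketched end to end as below. This is a simplified version, not the original solution: it treats only weekends as non-working days (public holidays are ignored) and uses lubridate's wday in place of is.weekend. The column name Count_imputed is illustrative.

```r
library(dplyr)
library(lubridate)
library(tsibble)

ped_qvm <- pedestrian %>%
  filter(Sensor == "QV Market-Elizabeth St (West)") %>%
  # Make the implicit missing hours explicit (Count becomes NA)
  fill_gaps() %>%
  mutate(
    Date = as_date(Date_Time),
    Time = factor(hour(Date_Time)),
    # Simplification: weekends only, public holidays are ignored here
    hol = wday(Date, week_start = 1) > 5
  )

# Hour-of-day effect, allowed to differ between working and non-working days
ped_qvm_lm <- lm(Count ~ Time * hol, data = ped_qvm)

ped_qvm <- ped_qvm %>%
  mutate(
    pCount = as.numeric(predict(ped_qvm_lm, newdata = ped_qvm)),
    # Keep observed counts, use the model prediction only where Count is missing
    Count_imputed = if_else(is.na(Count), pCount, as.numeric(Count))
  )

# The rows that were imputed
ped_qvm %>% filter(is.na(Count))
```

Because Time is a factor, the model estimates a separate mean for every hour and day type, so each imputed value is the average count for that hour on that kind of day.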
The heights data provided in the brolgar package contains average male heights in 144 countries from 1500-1989.
The time index is year, and key is country.
It looks like Australian males are getting taller.
heights <- heights %>%
  add_n_obs() %>%
  filter(n_obs >= 5)
Generally, the trend is up, so yes, it does look like men are getting taller across the globe.
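The visual impression of an upward trend can be backed up by fitting a straight line to each country and counting how many slopes are positive. This is a sketch using brolgar's key_slope, which is not part of the original solution; height_slopes is an illustrative name.

```r
library(brolgar)
library(dplyr)

height_slopes <- heights %>%
  add_n_obs() %>%
  filter(n_obs >= 5) %>%
  # Fit height_cm ~ year separately for each country
  key_slope(height_cm ~ year)

# Proportion of countries whose fitted slope is positive
mean(height_slopes$.slope_year > 0, na.rm = TRUE)

# Countries with the steepest fitted increase
height_slopes %>%
  arrange(desc(.slope_year)) %>%
  head()
```

A straight line is a crude summary for countries whose heights dip and recover, so the slopes are best read alongside the line plots.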
facet_strata to break the data into subsets using the year, and plot in several facets. What sort of patterns are there in terms of the earliest year that a country appears in the data?
The countries are spread fairly evenly across the facets, which suggests that new countries joined the collection at a roughly steady rate.
The average minimum height is about 164cm, median is about 168cm and tallest is about 172cm. The maximum height appears to be bimodal, with a small peak around 178cm.
Most countries have the expected pattern of increasing heights from minimum, median to maximum. There are a few which have very similar values of these, though, which is a bit surprising. It means that there has been no change in these metrics over time.
Denmark has the tallest men. Estonia has the tallest median height. Papua New Guinea has the shortest men. The distribution of heights over the years is not the same for each country.
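The per-country summaries discussed above can be computed with a five number summary feature. This sketch assumes the summaries were produced with brolgar's feat_five_num; heights_five is an illustrative name.

```r
library(brolgar)
library(dplyr)

# Minimum, quartiles, median and maximum of height for each country
heights_five <- heights %>%
  features(height_cm, feat_five_num)

# Country with the largest maximum height
heights_five %>% slice_max(max, n = 1)

# Country with the largest median height
heights_five %>% slice_max(med, n = 1)

# Country with the smallest minimum height
heights_five %>% slice_min(min, n = 1)
```

Sorting heights_five by each column in turn is a quick way to check which countries sit at the extremes.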